Portable and architecture independent parallel performance tuning using a call-graph profiling tool
Abstract
This paper describes a post-mortem call-graph profiling tool that analyses trace information generated during the execution of BSPlib programs. The purpose of the tool is to expose imbalance in either computation or communication, and to highlight portions of code that are amenable to improvement. Unlike other profiling tools, the profile information guides optimisation in an architecture independent way. From an ease of use perspective, the amount of information displayed when visualising a profile for a parallel program is no more complex than that of a sequential program.

1 Introduction

The role of a profiling tool is to associate computational bottlenecks that arise during program execution with easily identifiable segments of the source code. The usefulness of a profiling tool depends upon the ease with which users can employ this information to alleviate identified bottlenecks within their programs.

The success of profiling tools for sequential languages rests predominantly on three criteria. The first criterion is 'what' is measured; typically this might be the percentage of execution time spent in each part of the program. The second criterion is 'where' in the code these costs should be attributed; costs may be associated with functions or libraries, for example. The third criterion is 'how-to-use' the profiling information to optimise programs in a quantifiable and portable way; for example, problematic portions of code may be rewritten using an algorithm with improved asymptotic complexity.

The difference between profiling parallel programs and profiling sequential programs is that parallel programs are executed on a number of processors. Consequently, each part of the code may be associated with up to p costs, where p is the number of processors.
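As a rough illustration of the point about up to p costs per part of the code, the sketch below condenses the per-processor costs of each code segment into a single summary, so that a parallel profile has one row per segment, as a sequential profile would. It is not the tool's actual method: the function name summarise_costs, the imbalance formula (how far the slowest processor exceeds the mean) and the sample figures are all assumptions made for this example.

```python
from statistics import mean


def summarise_costs(costs_per_processor):
    """Collapse the per-processor costs of one code segment into one summary.

    The imbalance measure used here, (max - mean) / mean, is an assumption
    chosen for illustration, not a definition taken from the paper.
    """
    if not costs_per_processor:
        raise ValueError("need at least one per-processor cost")
    maximum = max(costs_per_processor)
    average = mean(costs_per_processor)
    # A perfectly balanced segment has max == mean, giving imbalance 0.
    imbalance = 0.0 if average == 0 else (maximum - average) / average
    return {"max": maximum, "mean": average, "imbalance": imbalance}


# Hypothetical costs in seconds on p = 4 processors for two code segments.
profile = {
    "sort": [2.0, 2.0, 2.0, 2.0],      # evenly spread work
    "exchange": [1.0, 1.0, 1.0, 5.0],  # one processor does most of the work
}

summaries = {name: summarise_costs(costs) for name, costs in profile.items()}

for name, summary in summaries.items():
    print(name, summary)
```

Under these assumptions, the evenly spread segment reports no imbalance, while the segment dominated by one processor stands out immediately, which is the kind of signal a user would look for when deciding where to optimise.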
The major challenge for the developers of profiling tools for parallel languages is to identify and expose the relationship (imbalance) between the computational costs on different processors, and subsequently to express this relationship in terms of the three criteria outlined above. Unfortunately, within a parallel framework, a multiplicity of interacting issues makes these criteria significantly more obscure and complex:

What-to-cost. In parallel programming there are at least two kinds of cost that can cause bottlenecks within programs: computation and communication. These costs should not be decoupled and profiled independently, because it is of paramount importance that the interaction between the two is identified and exposed to the user. The motivation …
Similar resources
Programming Research Group PORTABLE AND ARCHITECTURE INDEPENDENT PARALLEL PERFORMANCE TUNING USING A CALL-GRAPH PROFILING TOOL: A CASE STUDY IN OPTIMISING SQL
This paper describes a post-mortem call-graph profiling tool that analyses trace information generated during the execution of BSPlib programs. The purpose of the tool is to expose imbalance in either computation or communication, and to highlight portions of code that are amenable to improvement. One of the major benefits of this tool is that the amount of information displayed when visualising ...
Analysing an SQL Application with a BSPlib Call-Graph Profiling Tool
This paper illustrates the use of a post-mortem call-graph profiling tool in the analysis of an SQL query processing application written using BSPlib [4]. Unlike other parallel profiling tools, the architecture independent metric of imbalance in size of communicated data is used to guide program optimisation. We show that by using this metric, BSPlib programs can be optimised in a portable and ...
Portable and architecture independent parallel performance tuning using a call-graph profiling
This paper describes a post-mortem call-graph profiling tool that analyses trace information generated during the execution of BSPlib programs. The purpose of the tool is to expose imbalance in either computation or communication, and to highlight portions of code that are amenable to improvement. Unlike other profiling tools, the profile information guides optimisation in an architecture indepen...
Runtime Support for Asynchronous Simulation
We present library and runtime support for portable asynchronous applications, using event-driven simulation as an example. Although event-driven simulation has a natural source of parallelism between the simulated entities, real speedups have been hard to obtain because of the fine-grained, unpredictable communication patterns. Language and systems software support is also lacking for asynchron...
An Empirical Study of Java Dynamic Call Graph Extractors
A dynamic call graph is the invocation relation that represents a specific set of runtime executions of a program. Dynamic call graph extraction is a typical application of dynamic analysis to aid compiler optimization, performance analysis, program understanding, etc. In this paper, we empirically compare the results of nine Java dynamic call graph extractors quantitatively and qualitatively. ...